I am going to make some plotly plots based on the Instacart dataset.
library(tidyverse)
library(p8105.datasets)
library(plotly)
library(RColorBrewer)
Let’s get some data and preprocess it.
data("instacart")
set.seed(12345)
instacart_sub = instacart %>%
  sample_frac(0.25)
The instacart dataset contains 1,384,617 observations.
To work with a smaller dataset, I take a 25% random sample
(instacart_sub) from it.
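As a quick check of the sampling step, here is a minimal sketch on a hypothetical 100-row stand-in table (`toy` and `draw_sample` are names made up for illustration, not part of the analysis). It uses `slice_sample()`, which the dplyr documentation recommends over the superseded `sample_frac()`, and checks that fixing the seed gives the same sample on every run.

```r
library(dplyr)

# Hypothetical stand-in for instacart: 100 rows, so a 25% sample is 25 rows
toy = tibble(order_id = 1:100)

# Hypothetical helper: reset the seed, then draw a 25% sample without replacement
draw_sample = function(df, seed = 12345) {
  set.seed(seed)
  slice_sample(df, prop = 0.25)
}

toy_sub = draw_sample(toy)
toy_sub_again = draw_sample(toy)

nrow(toy_sub)                      # size of the sample
identical(toy_sub, toy_sub_again)  # same seed, same rows
```

The same pattern should carry over to the full data by swapping `toy` for `instacart`.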
instacart_sub %>%
  # Count how many times items from each aisle were ordered
  group_by(department, aisle) %>%
  summarise(
    order_count = n()
  ) %>%
  # Keep the 20 aisles with the fewest orders
  arrange(order_count) %>%
  ungroup() %>%
  head(20) %>%
  mutate(
    aisle = str_to_title(aisle),
    department = str_to_title(department),
    # Order the bars by count rather than alphabetically
    aisle = fct_reorder(aisle, order_count)
  ) %>%
  plot_ly(
    x = ~ order_count,
    y = ~ aisle,
    color = ~ department,
    type = "bar",
    colors = "Accent"
  ) %>%
  layout(
    title = "20 Least Popular Aisles",
    xaxis = list(title = "Number of Times Ordered"),
    yaxis = list(title = "Aisle"),
    legend = list(title = list(text = "Department")),
    autosize = FALSE
  )
This bar plot shows the 20 least popular aisles across all
21 departments. For instance, products from the Beauty aisle
were ordered only 63 times in this 25% random sample of
the instacart dataset.
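The counting step behind this plot can be sketched on a tiny made-up table. The department and aisle names below are hypothetical placeholders, not real Instacart values. This version uses `count()` and `slice_min()` as a compact alternative to the `group_by()`, `summarise()`, `arrange()` and `head()` chain; `with_ties = FALSE` keeps exactly `n` rows even when counts are tied.

```r
library(dplyr)

# Hypothetical toy data: one row per ordered item
toy_items = tibble(
  department = c("dept_a", "dept_a", "dept_b", "dept_b", "dept_b", "dept_c"),
  aisle      = c("aisle_1", "aisle_1", "aisle_2", "aisle_2", "aisle_2", "aisle_3")
)

least_popular = toy_items %>%
  # One row per (department, aisle) with the number of ordered items
  count(department, aisle, name = "order_count") %>%
  # Keep the 2 aisles with the smallest counts, sorted ascending
  slice_min(order_count, n = 2, with_ties = FALSE)

least_popular
n_distinct(toy_items$department)  # number of departments in the toy data
```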
instacart_sub %>%
  mutate(
    # order_dow is coded 0-6; shift to 1-7 so wday() can label it
    # (this assumes 0 corresponds to Sunday)
    order_dow = order_dow + 1,
    order_dow = lubridate::wday(order_dow, label = TRUE, abbr = FALSE)
  ) %>%
  plot_ly(
    x = ~ order_dow,
    y = ~ order_hour_of_day,
    type = "box"
  ) %>%
  layout(
    title = "Distribution of Order Hour by Day of Week",
    xaxis = list(title = "Day of Week"),
    yaxis = list(title = "Hour of Day")
  )
This box plot shows the distribution of the hour of day at which items were ordered on each day of the week. We can tell that people tend to place orders somewhat earlier on Mondays than on the other days.
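A reading like this can be checked against the numbers a box plot draws. The sketch below computes the median and quartiles of the order hour per day on a few made-up rows. `toy_orders` and its values are hypothetical, and the day labels assume that 0 means Sunday, as in the recoding used for the plot.

```r
library(dplyr)

# Hypothetical toy rows; in the analysis these columns come from instacart_sub
toy_orders = tibble(
  order_dow         = c(0, 0, 0, 1, 1, 1, 6, 6, 6),
  order_hour_of_day = c(10, 14, 16, 8, 9, 13, 11, 15, 17)
)

day_names = c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")

hour_summary = toy_orders %>%
  mutate(
    # Assumption: 0 = Sunday, so add 1 to index into day_names
    order_dow = factor(day_names[order_dow + 1], levels = day_names)
  ) %>%
  group_by(order_dow) %>%
  summarise(
    q1          = quantile(order_hour_of_day, 0.25, names = FALSE),
    median_hour = median(order_hour_of_day),
    q3          = quantile(order_hour_of_day, 0.75, names = FALSE)
  )

hour_summary
```

Running the same summary on `instacart_sub` would show whether the Monday median really sits below those of the other days.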